-
Throughout its lifecycle, an LLM incurs significantly higher carbon emissions during inference than during training. Inference requests vary in batch size, prompt length, and token generation, while cloud providers deploy heterogeneous GPU configurations to meet diverse service-level objectives. Unlike training, inference exhibits lower and highly variable hardware utilization, making equation-based carbon models unreliable. Existing network-based estimators lack accuracy, as they fail to account for the distinct prefill and decode phases, hardware-specific features, and realistic request distributions. We propose LLMCO2, a graph neural network (GNN)-based model, to improve the accuracy of LLM inference carbon footprint estimation by ~67% over prior approaches. Source code is available at https://github.com/fuzhenxiao/LLMCO2.
Free, publicly-accessible full text available July 1, 2026.
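A minimal sketch of the idea the abstract contrasts with equation-based models: even a simple per-request estimate must treat the compute-bound prefill phase and the memory-bound decode phase separately, since their energy per token differs. All constants below (energy-per-token figures, grid carbon intensity) are illustrative placeholders, not values from the LLMCO2 paper, whose actual estimator is a learned GNN.

```python
# Hypothetical sketch: phase-split carbon estimate for one LLM inference request.
# The energy and carbon-intensity constants are assumed, illustrative values.

def inference_carbon_g(prompt_tokens: int,
                       output_tokens: int,
                       batch_size: int,
                       prefill_j_per_tok: float = 0.5,   # prefill: compute-bound, batched
                       decode_j_per_tok: float = 2.0,    # decode: memory-bound, sequential
                       grid_gco2_per_kwh: float = 400.0) -> float:
    """Estimate grams of CO2 for a single request, modeling the prefill and
    decode phases separately because their hardware utilization differs."""
    # Prefill processes the whole prompt in parallel and amortizes over the batch;
    # decode generates tokens one at a time per request.
    prefill_j = prompt_tokens * prefill_j_per_tok / batch_size
    decode_j = output_tokens * decode_j_per_tok
    kwh = (prefill_j + decode_j) / 3.6e6  # joules -> kWh
    return kwh * grid_gco2_per_kwh

print(inference_carbon_g(512, 128, batch_size=8))
```

A learned model like the one proposed above replaces the fixed per-token constants with predictions conditioned on request mix and GPU configuration, which is where the accuracy gain comes from.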
-
Zeroth-order fine-tuning eliminates explicit back-propagation and reduces memory overhead for large language models (LLMs), making it a promising approach for on-device fine-tuning tasks. However, existing memory-centric accelerators fail to fully leverage these benefits due to inefficiencies in balancing bit density, compute-in-memory capability, and the endurance-retention trade-off. We present a reliability-aware, analog multi-level-cell (MLC) eDRAM-RRAM compute-in-memory (CIM) solution co-designed with zeroth-order optimization for language model fine-tuning. An RRAM-assisted eDRAM MLC programming scheme is developed, along with a process-voltage-temperature (PVT)-robust, large-sensing-window time-to-digital converter (TDC). The MLC eDRAM, integrating two-finger MOM capacitors, provides a 12× improvement in bit density over state-of-the-art MLC designs. A further 5× density and 2× retention benefit is gained by adopting BEOL In2O3 FETs.
Free, publicly-accessible full text available May 18, 2026.
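To see why zeroth-order optimization suits memory-constrained hardware, the sketch below shows an SPSA-style update: two forward passes and a shared random seed replace back-propagation, so no activations or gradients need to be stored. This is a generic illustration of the optimization style, not the paper's co-designed circuit-level method; the quadratic toy loss and step sizes are assumptions.

```python
# Hypothetical sketch: one zeroth-order (SPSA-style) update step.
import random

def zo_step(params, loss_fn, eps=1e-3, lr=1e-2, seed=0):
    """Zeroth-order update via two forward passes, no back-propagation.
    The perturbation is regenerated from the seed, so only a scalar loss
    difference and a seed must be kept -- the memory saving the abstract
    refers to."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in params]          # random direction
    plus = [p + eps * zi for p, zi in zip(params, z)]
    minus = [p - eps * zi for p, zi in zip(params, z)]
    g = (loss_fn(plus) - loss_fn(minus)) / (2 * eps)   # projected gradient
    return [p - lr * g * zi for p, zi in zip(params, z)]

# Toy usage: minimize sum(p^2) from a nonzero start.
loss = lambda ps: sum(p * p for p in ps)
params = [1.0, -2.0]
for step in range(200):
    params = zo_step(params, loss, seed=step)
print(loss(params))  # far below the initial loss of 5.0
```

Because each step touches the weights only through forward evaluations, an in-memory compute array can serve both phases of the update without a separate gradient datapath, which is the co-design opportunity the accelerator targets.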
